An examination of the effect of discretization on a naïve Bayes model's performance

Authors

  • Arezoo Aghaei Chadegani
  • Davood Poursina
Abstract

A Bayesian network (BN), or belief network, is a probabilistic graphical model that represents a set of variables and their probabilistic independencies. Research often involves continuous random variables. To use such variables in BN models, they must be converted into discrete variables with a limited number of states, often two. One problem researchers face during discretization is deciding how many states to use. Does the number of states chosen for discretization affect a model's predictive power? This study examines the question empirically in the field of financial distress prediction. The sample consists of 144 firms listed on the Tehran Stock Exchange from 1997 to 2007. Two methods of variable selection were used to develop naïve Bayes models: the first is based on conditional correlation between variables, and the second on conditional likelihood. The first naïve Bayes model, based on conditional correlation, predicts financial distress with 90% accuracy; the second, based on conditional likelihood, reaches 93%. Collectively, the results show that the second model performs better than the first. Further analyses show that the number of states chosen for discretization also affects model performance. When continuous variables are discretized into two, three, four, and five states, the naïve Bayes model's performance increases as the number of states rises from two to three and from three to four, but decreases when the number of states rises from four to five.
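The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say which discretization rule or software the authors used, so everything below is an assumption made for illustration: equal-frequency binning, a categorical naïve Bayes with Laplace smoothing, and synthetic data standing in for the financial ratios. The helper names (`equal_frequency_cuts`, `discretize`, `accuracy_for`, `synthetic`) are hypothetical, and the printed accuracies say nothing about the paper's results.

```python
import math
import random
from collections import Counter, defaultdict


def equal_frequency_cuts(values, n_states):
    """Return n_states - 1 cut points splitting `values` into roughly equal-sized bins."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_states] for k in range(1, n_states)]


def discretize(value, cuts):
    """Map a continuous value to a state index in 0..len(cuts)."""
    return sum(value >= c for c in cuts)


class NaiveBayes:
    """Categorical naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, rows, labels, n_states):
        self.n_states = n_states
        self.class_counts = Counter(labels)
        # counts[(feature index, class)][state] = number of training rows
        self.counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            for j, state in enumerate(row):
                self.counts[(j, label)][state] += 1
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_c in self.class_counts.items():
            # log prior plus the sum of smoothed log likelihoods per feature
            score = math.log(n_c / total)
            for j, state in enumerate(row):
                seen = self.counts[(j, label)][state]
                score += math.log((seen + 1) / (n_c + self.n_states))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy_for(n_states, train, test):
    """Learn cut points on the training set only, then score on the test set."""
    (x_tr, y_tr), (x_te, y_te) = train, test
    n_features = len(x_tr[0])
    cuts = [equal_frequency_cuts([r[j] for r in x_tr], n_states)
            for j in range(n_features)]

    def encode(row):
        return [discretize(v, cuts[j]) for j, v in enumerate(row)]

    model = NaiveBayes().fit([encode(r) for r in x_tr], y_tr, n_states)
    hits = sum(model.predict(encode(r)) == y for r, y in zip(x_te, y_te))
    return hits / len(x_te)


def synthetic(n, rng):
    """Hypothetical data: two made-up continuous features whose means differ by class."""
    xs, ys = [], []
    for _ in range(n):
        y = int(rng.random() < 0.5)
        mu = 1.0 if y else -1.0
        xs.append([rng.gauss(mu, 1.5), rng.gauss(mu, 1.5)])
        ys.append(y)
    return xs, ys


rng = random.Random(0)
train, test = synthetic(400, rng), synthetic(200, rng)
results = {k: accuracy_for(k, train, test) for k in (2, 3, 4, 5)}
for k, acc in results.items():
    print(f"{k} states: accuracy {acc:.3f}")
```

The cut points are learned from the training rows only, so the test rows do not influence the binning. More states give the model finer resolution but leave fewer training rows per state, which is one plausible reason accuracy can stop improving as the number of states grows.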

Similar resources

Effective Discretization and Hybrid feature selection using Naïve Bayesian classifier for Medical datamining

As a probability-based statistical classification method, the Naïve Bayesian classifier has gained wide popularity despite its assumption that attributes are conditionally mutually independent given the class label. Improving the predictive accuracy and achieving dimensionality reduction for statistical classifiers has been an active research area in datamining. Our experimental results suggest...

An Evolutionary Multi-objective Discretization based on Normalized Cut

Learning models and their results depend on the quality of the input data. If raw data is not properly cleaned and structured, the results tend to be incorrect. Therefore, discretization, as one of the preprocessing techniques, plays an important role in learning processes. The most important challenge in the discretization process is to reduce the number of features' values. This operat...

Optimized Intrusion Detection by CACC Discretization Via Naïve Bayes and K-Means Clustering

The Network Intrusion Detection System (IDS), as the main security defense technique, is the second guard for a network after the firewall. Data mining technology is applied to network intrusion detection, and the precision of detection is improved by the strengths of data mining. For IDS many machine learning approaches are ad-acute but they all work efficiently on basis of the training data ...

A New Hybrid Approach for Network Traffic Classification Using Svm and Naïve Bayes Algorithm

Traffic classification is an automated process that categorizes computer network traffic into a number of traffic classes according to various parameters. Many supervised classification algorithms and unsupervised clustering algorithms have been applied to categorize Internet traffic. Traditional traffic classification methods include the port-based prediction methods and payload-based deep ins...

Naïve Bayes Classifier with Various Smoothing Techniques for Text Documents

Due to the huge increase in text data, its classification has become an important issue nowadays. There are many good classification techniques discussed in this paper. Each classification method has its own assumptions, advantages and limitations. One of the most widely used classifiers is Naïve Bayes, which performs well with different data sets. Various Smoothing techniques are applied ...

Publication date: 2013